Feature selection is a key step in data preprocessing for software defect prediction. To address the problems of existing feature selection methods, such as insignificant dimensionality reduction and low classification accuracy of the selected feature subset, a feature selection method for software defect prediction based on Self-adaptive Hybrid Particle Swarm Optimization (SHPSO) was proposed. Firstly, combined with population partitioning, a self-adaptive weight update strategy based on Q-learning was designed, in which Q-learning was introduced to adaptively adjust the inertia weight according to the states of the particles. Secondly, to balance the global search ability in the early stage of the algorithm and the convergence speed in its later stage, time-varying learning factors based on curve adaptivity were proposed. Finally, a hybrid position update strategy was adopted to help particles escape local optima as early as possible and to increase the diversity of the particles. Experiments were carried out on 12 public software defect datasets. The results show that the proposed method effectively improves the classification accuracy of the software defect prediction model and reduces the dimensionality of the feature space compared with the method using all features, the commonly used traditional feature selection methods, and the mainstream feature selection methods based on intelligent optimization algorithms. Compared with the Improved Salp Swarm Algorithm (ISSA), the proposed method increases classification accuracy by about 1.60% on average and reduces the feature subset size by about 63.79% on average. Experimental results show that the proposed method can select a feature subset with high classification accuracy and small size.
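The Q-learning-driven inertia weight adjustment described above can be sketched as follows. This is a minimal illustration under assumed simplifications: two coarse swarm states, three weight-adjustment actions, and a reward of 1 when the swarm improves; the paper's actual state, action and reward design is not reproduced here.

```python
import random

# Hypothetical sketch: a Q-learning agent adapting the PSO inertia weight.
# States, actions, reward and the [0.4, 0.9] weight range are assumptions.

ACTIONS = [-0.05, 0.0, 0.05]        # decrease, keep, or increase the weight
STATES = ["improving", "stagnant"]  # coarse description of the swarm's state

class InertiaAgent:
    def __init__(self, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def choose(self, state):
        # epsilon-greedy action selection
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # standard one-step Q-learning update
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td

agent = InertiaAgent()
w = 0.9
state = "stagnant"
action = agent.choose(state)
w = min(0.9, max(0.4, w + action))  # clamp the adjusted inertia weight
agent.update(state, action, reward=1.0, next_state="improving")
```

One PSO iteration would then use `w` in the velocity update before observing the next swarm state.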
To address the problems of the Multi-scale Generative Adversarial Networks Image Inpainting algorithm (MGANII), such as unstable training, poor structural consistency, and insufficient details and textures in the inpainted image, a multi-scale generative adversarial network image inpainting algorithm based on multi-feature fusion was proposed. Firstly, aiming at the problems of poor structural consistency and insufficient details and textures, a Multi-Feature Fusion Module (MFFM) was introduced into the traditional generator, and a perception-based feature reconstruction loss function was introduced to improve the feature extraction ability of the dilated convolutional network, thereby supplying more details and texture features for the inpainted image. Then, a perception-based feature matching loss function was introduced into the local discriminator to enhance its discrimination ability, thereby improving the structural consistency of the inpainted image. Finally, a risk penalty term was introduced into the adversarial loss function to satisfy the Lipschitz continuity condition, so that the network converged rapidly and stably during training. On the CelebA dataset, the proposed multi-feature fusion image inpainting algorithm converges faster than MGANII. Meanwhile, the Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM) of the images inpainted by the proposed algorithm are improved by 0.45% to 8.67% and 0.88% to 8.06% respectively compared with those of the baseline algorithms, and the Fréchet Inception Distance (FID) score of the inpainted images is reduced by 36.01% to 46.97% compared with that of the baseline algorithms. Experimental results show that the inpainting performance of the proposed algorithm is better than that of the baseline algorithms.
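A penalty term that pushes the discriminator toward Lipschitz continuity can be illustrated with a small NumPy sketch in the spirit of a gradient penalty on samples interpolated between real and generated data; the linear toy "discriminator" and the exact penalty form are illustrative assumptions, not the paper's loss.

```python
import numpy as np

# Toy sketch of a gradient penalty enforcing a 1-Lipschitz constraint.
# For D(x) = x @ w, the gradient w.r.t. the input is simply w, so the
# penalty is analytic here; a real discriminator needs autodiff.

def gradient_penalty(w, real, fake, rng):
    # sample points on lines between real and fake examples
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1 - eps) * fake
    grads = np.tile(w, (x_hat.shape[0], 1))   # input gradient of the linear D
    norms = np.linalg.norm(grads, axis=1)
    return np.mean((norms - 1.0) ** 2)        # penalize deviation from norm 1

rng = np.random.default_rng(0)
w = np.array([3.0, 4.0])          # ||w|| = 5, so D is far from 1-Lipschitz
real = rng.normal(size=(8, 2))
fake = rng.normal(size=(8, 2))
gp = gradient_penalty(w, real, fake, rng)     # (5 - 1)^2 = 16.0
```

Adding a term like `lambda * gp` to the adversarial loss discourages large discriminator gradients, which is what stabilizes training.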
To reduce the regression test set and improve the efficiency of regression testing in a Continuous Integration (CI) environment, a regression test suite selection method for the CI environment was proposed. First, commits were prioritized based on the historical failure rate and execution rate of each test suite related to each commit. Then, a machine learning method was used to predict the failure rates of the test suites involved in each commit, and the test suites with higher failure rates were selected. In this method, commit prioritization and test suite selection were combined to increase the failure detection rate while reducing the test cost. Experimental results on Google's open-source dataset show that, compared with methods using the same commit prioritization or test suite selection technique, the proposed method improves the Average Percentage of Faults Detected per cost (APFDc) by 1% to 27%; at the same test time cost, the TestRecall of the method increases by 33.33 to 38.16 percentage points, the ChangeRecall increases by 15.67 to 24.52 percentage points, and the test suite SelectionRate decreases by about 6 percentage points.
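The two-stage idea, ranking commits by the historical failure and execution rates of their related suites and then keeping only suites whose failure rate is high, can be sketched as follows; the multiplicative scoring, the 0.3 threshold, and the data layout are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical sketch of commit prioritization plus test suite selection.
# commits maps a commit id to (suite, failure_rate, execution_rate) triples.

def prioritize_commits(commits):
    def score(suites):
        # assumption: combine the two historical rates multiplicatively
        return max(f * e for _, f, e in suites)
    return sorted(commits, key=lambda c: score(commits[c]), reverse=True)

def select_suites(suites, threshold=0.3):
    # keep only suites whose (predicted) failure rate clears the threshold
    return [s for s, f, _ in suites if f >= threshold]

commits = {
    "c1": [("t1", 0.8, 0.9), ("t2", 0.1, 0.5)],
    "c2": [("t3", 0.2, 0.4)],
}
order = prioritize_commits(commits)          # "c1" outranks "c2"
chosen = select_suites(commits["c1"])        # only "t1" survives
```

In the actual method the per-suite failure rate would come from a trained model rather than raw history.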
Considering the lack of effective trend feature descriptors in existing methods, financial technical indicators such as the Vertical Horizontal Filter (VHF) and Moving Average Convergence/Divergence (MACD) were introduced into power data analysis, and an anomaly detection algorithm and a load forecasting algorithm using financial technical indicators were proposed. In the anomaly detection algorithm, the thresholds of the financial technical indicators were determined based on statistics, and abnormal power consumption behaviors of users were then detected by threshold detection. In the load forecasting algorithm, 14-dimensional daily load features related to the financial technical indicators were extracted, and a Long Short-Term Memory (LSTM) load forecasting model was built. Experimental results on industrial power data of Hangzhou City show that the proposed load forecasting algorithm reduces the Mean Absolute Percentage Error (MAPE) to 9.272%, which is lower than that of the Autoregressive Integrated Moving Average (ARIMA), Prophet and Support Vector Machine (SVM) algorithms by 2.322, 24.175 and 1.310 percentage points, respectively. The results show that financial technical indicators can be effectively applied to power data analysis.
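The two indicators named above have standard definitions that can be computed directly: VHF divides the range of a series over a window by the sum of its absolute step-to-step changes (values near 1 indicate a strong trend), and MACD is the difference between a fast and a slow exponential moving average, smoothed again into a signal line. A minimal sketch, using the conventional default window lengths rather than necessarily the paper's:

```python
# Standard definitions of VHF and MACD applied to a numeric series
# (here it would be a daily load series rather than prices).

def ema(series, n):
    # exponential moving average with smoothing factor 2 / (n + 1)
    k = 2 / (n + 1)
    out = [series[0]]
    for x in series[1:]:
        out.append(out[-1] + k * (x - out[-1]))
    return out

def vhf(series, n=28):
    # range over the window divided by total absolute movement
    window = series[-n:]
    move = sum(abs(a - b) for a, b in zip(window[1:], window))
    return (max(window) - min(window)) / move if move else 0.0

def macd(series, fast=12, slow=26, signal=9):
    diff = [f - s for f, s in zip(ema(series, fast), ema(series, slow))]
    return diff, ema(diff, signal)    # MACD line and its signal line

load = [float(i) for i in range(1, 60)]   # toy, strictly rising series
trend = vhf(load)                          # a pure trend gives VHF = 1.0
diff, sig = macd(load)                     # MACD stays positive in an uptrend
```

Features like these, computed per day, would form part of the 14-dimensional input to the LSTM model.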
The emergence of RAMCloud has improved the user experience of OnLine Data-Intensive (OLDI) applications; however, its energy consumption is higher than that of traditional cloud data centers. An energy-efficient disk strategy under this architecture was put forward to solve the problem. Firstly, the fitness function and roulette wheel selection of the genetic algorithm were introduced to choose energy-saving disks for persistent data backup; secondly, a reasonable buffer size was set to extend the average continuous idle time of the disks, so that some of them could be put into standby during their idle periods. Simulation results show that the proposed strategy saves about 12.69% of energy in a given RAMCloud system with 50 servers. The buffer size affects both the energy-saving effect and data availability, so a trade-off between the two must be made.
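Roulette wheel selection picks each candidate with probability proportional to its fitness. A minimal sketch, assuming a fitness value that is higher for more energy-efficient disks (the actual fitness function of the strategy is not reproduced here):

```python
import random

# Roulette wheel (fitness-proportionate) selection over backup disks.
# The fitness values are illustrative assumptions.

def roulette_select(disks, fitness, rng):
    total = sum(fitness[d] for d in disks)
    r = rng.uniform(0, total)       # spin the wheel
    acc = 0.0
    for d in disks:
        acc += fitness[d]           # each disk owns a slice of size fitness[d]
        if r <= acc:
            return d
    return disks[-1]                # guard against floating-point round-off

rng = random.Random(42)
fitness = {"d1": 0.9, "d2": 0.1}    # d1 is the more energy-efficient disk
picks = [roulette_select(["d1", "d2"], fitness, rng) for _ in range(1000)]
```

Over many selections, `d1` is chosen roughly nine times as often as `d2`, so backups concentrate on energy-saving disks while still occasionally exercising the rest.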
Programmer activities such as copying, pasting and modifying code result in a large number of code clones in software systems, and inconsistent changes of code clones during software evolution are a major cause of program errors and increased maintenance costs. To solve this problem, a new research method was proposed. First, the mapping relationships between clone groups were built. Then, the topics of lineal clone clusters were extracted using the Latent Dirichlet Allocation (LDA) model. Finally, the inconsistent change probability of code clones was predicted. A software system containing eight versions was tested, and obvious discrimination was obtained. The experimental results show that the method can effectively predict the probability of inconsistent changes and can be used to evaluate the quality and credibility of software.
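As a toy stand-in for the LDA-based prediction, the sketch below estimates a clone lineage's inconsistent-change probability as the fraction of version transitions in which the clone group changed inconsistently; the data layout and the simple frequency estimate are illustrative assumptions rather than the paper's model.

```python
# Hypothetical sketch: a frequency-based estimate of the inconsistent-change
# probability of one clone lineage tracked across software versions.

def inconsistency_probability(history):
    # history: one flag per version transition,
    # True = the clone group changed inconsistently in that transition
    if not history:
        return 0.0
    return sum(history) / len(history)

# a lineage observed over 7 transitions (e.g. 8 versions), 3 inconsistent
lineage = [False, True, False, True, True, False, False]
p = inconsistency_probability(lineage)   # 3 / 7
```

The actual method replaces this raw frequency with a prediction informed by the LDA topics of each lineal clone cluster.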
A fast image stitching algorithm based on improved Speeded Up Robust Features (SURF) was proposed to overcome the real-time and robustness problems of the original SURF-based stitching algorithms. A machine learning method was adopted to build a binary classifier, which identified the critical feature points obtained by SURF and removed the non-critical ones. In addition, the Relief-F algorithm was used to reduce the dimensionality of the improved SURF descriptor to accomplish image registration. A weighted threshold fusion algorithm was adopted to achieve seamless image stitching. Several experiments verified the real-time performance and robustness of the improved algorithm, and both the efficiency of image registration and the speed of image stitching were improved.
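The fusion step blends the two registered images across their overlap so the seam fades gradually from one image to the other (gradual-in, gradual-out weighting). A minimal sketch on toy single-channel images; the image sizes, overlap width and linear weights are assumptions for illustration:

```python
import numpy as np

# Linear weighted fusion of two horizontally registered grayscale images:
# outside the overlap each image is copied, inside it they are blended.

def weighted_fusion(left, right, overlap):
    h, w_l = left.shape
    _, w_r = right.shape
    out = np.zeros((h, w_l + w_r - overlap))
    out[:, :w_l - overlap] = left[:, :w_l - overlap]   # left-only region
    out[:, w_l:] = right[:, overlap:]                  # right-only region
    alpha = np.linspace(1.0, 0.0, overlap)             # weight of left image
    out[:, w_l - overlap:w_l] = (alpha * left[:, -overlap:]
                                 + (1 - alpha) * right[:, :overlap])
    return out

left = np.full((2, 6), 100.0)     # toy flat images with different intensities
right = np.full((2, 6), 200.0)
pano = weighted_fusion(left, right, overlap=4)
```

Across the 4-pixel overlap the intensity ramps smoothly from 100 to 200 instead of jumping at a hard seam, which is what makes the stitch visually seamless.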